Abstract

The purpose of this study is twofold. The first aim is to show the effect of outliers on the least squares regression estimator, which is widely used in the social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination ($R^2$). For this purpose, analyses were performed on three data sets. The first data set is hypothetical and consists of 15 students' final scores in General Mathematics and Linear Algebra. The second data set was collected from 231 adolescents attending different high schools in Turkey, using the Scale of Aggressiveness, the Academic Self-Efficacy Scale, the Scale of Peer Pressure, and the Trait Anxiety Inventory. The third data set was collected from 1,385 high school students using the Maslach Burnout Inventory-Student Survey, the Coping Styles of Stress Scale, the Test Anxiety Inventory, the Adolescence Self-Efficacy Scale, and the Parental Attitude Scale. Comparisons with small, medium, and large samples showed that, especially for data sets containing outliers, the M-estimate is a better alternative to least squares in terms of $R^2$. The findings are discussed in light of the recommendations presented in the literature.
In scientific research, finding a relationship between two or more variables and expressing it as a mathematical equation is an important step toward making predictions. Such a relationship is not merely functional: it also means that, given a value of one variable, the other can be estimated.

The method that permits one to depict the 
relationship between variables in an equation is 
called “regression analysis,” a method which has 
applications in almost every field (Ariel, 1991). 


Regression analysis has an important role in scientific research because it allows researchers to make predictions, which is one of the most important missions of science. In fact, regression analysis may be the most widely used statistical technique (Büyüköztürk, 2005; Büyüköztürk, Çokluk, & Köklü, 2011).

In general, the simple linear regression model is:

$$ y = \beta_0 + \beta_1 x + \varepsilon \tag{1} $$

where the intercept $\beta_0$ and the slope $\beta_1$ are unknown constants and $\varepsilon$ is a random error component.
The errors are assumed to have a mean of 0 and a variance of $\sigma^2$. Additionally, the errors are usually assumed to be uncorrelated. Customarily, $x$ is called the independent variable and $y$ the dependent variable; the dependent variable $y$ is a function of the independent variable $x$ (Büyüköztürk, 2005).

A regression model that involves more than one regressor variable is called a multiple regression model. The model

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{2} $$

is called a multiple linear regression model with $k$ regressors. The parameters $\beta_j$, $j = 0, 1, \ldots, k$, are called regression coefficients (Draper & Smith, 1998; Montgomery, Peck, & Vining, 2001).

It is more convenient to deal with multiple 
regression models if they are expressed in matrix 
notation. This allows for a very compact display of 
the model, data, and results. In matrix notation, the 
model given by Eq. (2) is: 

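$$ y = X\beta + \varepsilon \tag{3} $$

where

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$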

In general, $y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ matrix of the levels of the regressor variables (with $p = k + 1$), $\beta$ is a $p \times 1$ vector of regression coefficients, and $\varepsilon$ is an $n \times 1$ vector of random errors (Montgomery et al., 2001). The major assumptions of regression analysis made in this study are as follows:

i) The relationship between the response $y$ and the regressors is linear, at least approximately.

ii) The error term $\varepsilon$ has zero mean.

iii) The error term $\varepsilon$ has constant variance $\sigma^2$.

iv) The errors are uncorrelated.

v) The errors are normally distributed.

Assumption (v) is required for hypothesis testing and interval estimation. Together, assumptions (iv) and (v) imply that the errors are independent random variables (Draper & Smith, 1998; Montgomery et al., 2001).

Least squares estimation is widely used to estimate the unknown parameters in regression analysis. The least squares function is

$$ S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon, \qquad \varepsilon = y - X\beta $$

This function must be minimized with respect to $\beta$. Thus, the least squares estimator of $\beta$ is

$$ \hat{\beta} = (X'X)^{-1} X'y \tag{5} $$

and the fitted regression model is

$$ \hat{y} = X\hat{\beta} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k \tag{6} $$
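As a minimal, dependency-free sketch of Eq. (5), the Python code below builds $X$ with a leading column of ones and solves the normal equations $(X'X)\beta = X'y$ by Gaussian elimination instead of inverting $X'X$ explicitly. The helper names `solve`, `ols`, and `fitted` are hypothetical illustrations, not part of any library; in practice a numerical library's least squares routine would be the safer choice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # largest pivot
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least squares estimate of beta, Eq. (5).

    rows holds the regressor values [x_i1, ..., x_ik] of each observation;
    a column of ones is prepended, so X is n x p with p = k + 1.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[c] for xi in X) for c in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    # Solve (X'X) beta = X'y rather than forming (X'X)^{-1}.
    return solve(xtx, xty)


def fitted(rows, beta):
    """Fitted values y_hat = X beta_hat, Eq. (6)."""
    return [beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], r)) for r in rows]


# Usage: noise-free data generated from y = 1 + 2*x1 - 3*x2
rows = [(1, 0), (0, 1), (2, 1), (3, 5), (4, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
beta = ols(rows, y)
print([round(bj, 6) for bj in beta])
```

Because the example data contain no error term, the estimate should recover the generating coefficients up to floating-point rounding.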

Standardized residuals for least squares are

$$ r_i = \frac{e_i}{\hat{\sigma}} \tag{7} $$

where $e_i = y_i - \hat{y}_i$ and

$$ \hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^{n} e_i^2 \tag{8} $$

Under the assumptions above, the errors have zero mean and a common standard deviation $\sigma$, and they are identically distributed. Therefore, $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$ (Birkes & Dodge, 1993; Chatterjee, Hadi, & Price, 2000; Draper & Smith, 1998; Montgomery et al., 2001).
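For a single regressor, Eq. (5) reduces to $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below applies this, together with Eqs. (7) and (8), to the 15 General Mathematics (x) and Linear Algebra (y) scores listed in Table 2, with $p = 2$ estimated coefficients. It is only an illustration of the formulas, assuming the data are exactly as tabulated; observation 11 should have by far the largest standardized residual in absolute value.

```python
import math

# GMat. (x) and LCeb. (y) scores from Table 2 (15 students)
x = [80, 67, 57, 35, 80, 60, 80, 82, 70, 72, 90, 85, 60, 84, 71]
y = [50, 50, 56, 30, 60, 30, 70, 51, 50, 54, 10, 73, 56, 52, 52]

n, p = len(x), 2  # p = number of estimated coefficients (intercept and slope)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i = y_i - y_hat_i
sigma2 = sum(e * e for e in resid) / (n - p)            # Eq. (8)
std_resid = [e / math.sqrt(sigma2) for e in resid]      # Eq. (7)

print(f"y_hat = {b0:.3f} + {b1:.3f}x, sigma^2 = {sigma2:.2f}")
for i, r in enumerate(std_resid, start=1):
    print(i, round(r, 2))
```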

It is often assumed in the social sciences that data conform to a normal distribution. The least squares method is suitable and has good statistical properties when the errors are normally distributed. However, when the data deviate from normality, the least squares estimator can perform poorly. In this situation, robust estimators can be a suitable alternative (Arslan & Billor, 2000).

Robust statistics refers to the stability theory of 
statistical procedures. It systematically investigates 
the effects of deviations from modeling assumptions 
on known procedures and, if necessary, develops 
new, better procedures. Common modeling 
assumptions are those of normality and of 
independence of random errors. 

The implicit or explicit hope that the least squares method would remain approximately optimal under approximate (instead of exact) normality was dashed by Tukey (1960). Soon after Tukey's (1960) inspiring paper, the foundations for four closely related robustness theories were laid by Huber (1964, 1965), Hampel (1968), and Rousseeuw (1984).

As mentioned above, one of the main problems in regression analysis is outliers, that is, observations far from the bulk of the data. The main goal of robust statistical methods is to provide estimators that resist outliers. Nevertheless, in the social sciences the least squares method is preferred to robust methods because it is easy to compute (Aktaş, 2005; Aluçdibi & Ekici, 2012; Coşkuntuncel, 2005; Gündüz & Çelikkaleli, 2009; Gimeş & Tulçal, 2002; İnandı, 2009; Rahman & Amri, 2011; Şahin & Anıl, 2012).

Many procedures for robust regression estimation have been proposed in the literature. Among the most commonly used is the robust M-estimator, which is known as the classical robust regression estimator (Arslan, 1992, 2004a, 2004b; Arslan & Billor, 1996; Arslan, Edlund, & Ekblom, 2001; Belsley, Kuh, & Welsch, 1980; Hampel, Ronchetti, Rousseeuw, & Stahel, 1986; Huber, 1981; Rousseeuw, 1984; Rousseeuw & Leroy, 1987; Rousseeuw & Yohai, 1984; Rousseeuw & Zomeren, 1990).

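The M-estimator for the unknown coefficient vector $\beta$ given in Eq. (3) is

$$ \min_{\beta} \sum_{i=1}^{n} \rho(e_i) = \min_{\beta} \sum_{i=1}^{n} \rho(y_i - x_i'\beta) \tag{9} $$

where $x_i'$ is the $i$th row of $X$ and $\rho(e)$ is a function that satisfies the following conditions:

i) $\rho(e) \ge 0$; ii) $\rho(0) = 0$; iii) $\rho(e) = \rho(-e)$; iv) $\rho(e_i) \ge \rho(e_j)$ whenever $|e_i| > |e_j|$, where $e_i = y_i - x_i'\beta$.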

The two most widely used $\rho$ functions are the Huber and Tukey functions. The Huber $\rho$ function is

$$ \rho(e) = \begin{cases} e^2/2, & -k \le e \le k \\ k|e| - k^2/2, & \text{otherwise} \end{cases} $$

where $k = 1{,}345$, and the Tukey $\rho$ function is

$$ \rho(e) = \begin{cases} \dfrac{c^2}{6}\left[1 - \left(1 - (e/c)^2\right)^3\right], & |e| \le c \\[1ex] c^2/6, & |e| > c \end{cases} $$

where $c = 5$ or $6$ (Hampel et al., 1986; Huber, 1981; Maronna, Martin, & Yohai, 2006; Rousseeuw & Leroy, 1987). In this study, the Tukey $\rho$ function has been used. Since the Tukey $\rho$ function is differentiable, setting the derivative of Eq. (9) with respect to $\beta$ to 0 gives the estimating equation

$$ \sum_{i=1}^{n} \frac{\rho'(e_i)}{e_i}\, e_i\, x_i = 0 \tag{10} $$

where $e_i \neq 0$. Further, if $w_i = \rho'(e_i)/e_i$, the following weighted form of the estimator for $\beta$ is obtained:

$$ \hat{\beta}_M = (X'WX)^{-1} X'Wy \tag{11} $$

where $W = \mathrm{diag}(w_1, \ldots, w_n)$ is a diagonal matrix. Because the weights depend on the residuals, which in turn depend on $\hat{\beta}_M$, Eq. (11) is solved iteratively (iteratively reweighted least squares). The weight function is a bounded function of the residuals, so observations with large residuals receive smaller weights and hence have less effect on the estimator (Birkes & Dodge, 1993; Coşkuntuncel, 2009).

In order to assess the quality of the fit in multiple linear regression, the coefficient of determination, or $R^2$, is a very simple tool, yet the one most used by practitioners. Indeed, it is reported in most statistical analyses, and although it is not recommended as a final model selection tool, it does provide an indication of the suitability of the chosen explanatory variables for predicting the response. $R^2$ is often understood as the proportion of the variation in $y$ explained by the regressors. For least squares, the coefficient of determination can be computed from the ANOVA table given in Table 1 (Montgomery et al., 2001):

$$ R^2 = \frac{\mathrm{KT}_R}{\mathrm{KT}_T} = 1 - \frac{\mathrm{KT}_H}{\mathrm{KT}_T} \tag{12} $$

Since $0 \le \mathrm{KT}_H \le \mathrm{KT}_T$, it follows that $0 \le R^2 \le 1$. Values of $R^2$ close to 1 imply that most of the variability in $y$ is explained by the regression model.
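The Huber and Tukey $\rho$ functions defined for Eq. (9), together with $\psi = \rho'$ and the weights $w(e) = \psi(e)/e$ used in Eq. (11), can be sketched as plain Python functions. This is an illustrative sketch, not the routine used in the study: the defaults $k = 1{,}345$ and $c = 6$ follow the text, and the functions take raw residuals, whereas a real fit would first divide residuals by a scale estimate that the text does not specify.

```python
def huber_rho(e, k=1.345):
    """Huber rho: quadratic for |e| <= k, linear beyond."""
    return e * e / 2 if abs(e) <= k else k * abs(e) - k * k / 2


def huber_weight(e, k=1.345):
    """w(e) = psi(e)/e = min(1, k/|e|); defined as 1 at e = 0."""
    return 1.0 if abs(e) <= k else k / abs(e)


def tukey_rho(e, c=6.0):
    """Tukey bisquare rho; constant c^2/6 once |e| > c."""
    if abs(e) > c:
        return c * c / 6
    return (c * c / 6) * (1 - (1 - (e / c) ** 2) ** 3)


def tukey_psi(e, c=6.0):
    """psi(e) = rho'(e) = e (1 - (e/c)^2)^2 for |e| <= c, else 0."""
    return e * (1 - (e / c) ** 2) ** 2 if abs(e) <= c else 0.0


def tukey_weight(e, c=6.0):
    """w(e) = psi(e)/e = (1 - (e/c)^2)^2 for |e| <= c, else 0."""
    return (1 - (e / c) ** 2) ** 2 if abs(e) <= c else 0.0


# Usage: compare how the two weight functions treat growing residuals.
for e in [0.0, 1.0, 3.0, 5.5, 7.0]:
    print(e, round(huber_weight(e), 3), round(tukey_weight(e), 3))
```

The printout illustrates the design difference: Huber weights decay slowly and never reach zero, while Tukey weights drop to exactly zero beyond $c$, which is why a gross outlier can be ignored entirely by the Tukey M-estimator.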


Table 1.

Analysis of Variance (ANOVA) Table

| Source of Variation | Sum of Squares (KT) | Degrees of Freedom | Mean Square (KO) | F |
|---|---|---|---|---|
| Regression | $\mathrm{KT}_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ | $k$ | $\mathrm{KO}_R = \mathrm{KT}_R / k$ | $\mathrm{KO}_R / \mathrm{KO}_H$ |
| Error | $\mathrm{KT}_H = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | $n - k - 1$ | $\mathrm{KO}_H = \mathrm{KT}_H / (n - k - 1)$ | |
| Total | $\mathrm{KT}_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$ | $n - 1$ | | |

In the classical setting, it is well known that the least squares fit and the coefficient of determination may be arbitrary and/or misleading in the presence of a single outlier. In many applied settings, both the assumption of normality of the errors and the absence of outliers are difficult to establish. In these cases, Renaud and Feser (2010) have suggested a robust coefficient of determination, $R^2_w$:

$$ R^2_w = \frac{\left[\sum_{i=1}^{n} w_i (y_i - \bar{y}_w)(\hat{y}_i - \bar{\hat{y}}_w)\right]^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y}_w)^2 \; \sum_{i=1}^{n} w_i (\hat{y}_i - \bar{\hat{y}}_w)^2} \tag{13} $$

where $\bar{y}_w = (1/\sum_i w_i)\sum_i w_i y_i$ and $\bar{\hat{y}}_w = (1/\sum_i w_i)\sum_i w_i \hat{y}_i$, and $w_i$ and $\hat{y}_i$ are the weights and the fitted values of the robust M-estimator. The total sum of squares form of $R^2_w$ is

$$ \tilde{R}^2_w = 1 - \frac{\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y}_w)^2} \tag{14} $$

and $\tilde{R}^2_w = R^2_w$ (Renaud & Feser, 2010).
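The following Python sketch computes the least squares $R^2$ of Eq. (12) and the robust $R^2_w$ and $\tilde{R}^2_w$ of Eqs. (13) and (14) for the Table 2 data. It is a sketch under stated assumptions, not the study's computation: the M-estimate is obtained by iteratively reweighted least squares with the Tukey weights of Eq. (11), starting from least squares, and the tuning constant $c = 6$ and the scale estimate (the median absolute residual) are assumptions, since the text does not state which scale `rreg` used. The weights therefore need not match Table 2 exactly. Because the returned fit is the weighted least squares solution for the returned weights, the two robust measures should coincide up to rounding.

```python
import statistics

# Table 2 data: GMat. (x) and LCeb. (y) for the 15 students
x = [80, 67, 57, 35, 80, 60, 80, 82, 70, 72, 90, 85, 60, 84, 71]
y = [50, 50, 56, 30, 60, 30, 70, 51, 50, 54, 10, 73, 56, 52, 52]


def wls_line(x, y, w):
    """Weighted least squares line b0 + b1*x (Eq. 11 with one regressor)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return yb - b1 * xb, b1


def tukey_weight(u, c=6.0):
    """Bisquare weight (1 - (u/c)^2)^2 inside [-c, c], 0 outside."""
    return (1 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0


def m_fit(x, y, c=6.0, max_iter=100, tol=1e-10):
    """Tukey M-estimate by iteratively reweighted least squares, starting from LS."""
    b = wls_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        r = [yi - (b[0] + b[1] * xi) for xi, yi in zip(x, y)]
        s = statistics.median([abs(ri) for ri in r])  # ASSUMED scale: median |residual|
        w = [tukey_weight(ri / s, c) for ri in r]
        b_new = wls_line(x, y, w)
        converged = max(abs(p - q) for p, q in zip(b, b_new)) < tol
        b = b_new
        if converged:
            break
    return b, w  # b is the WLS fit for exactly these weights


def r2_w(y, yhat, w):
    """Eq. (13): squared weighted correlation of y and the fitted values."""
    sw = sum(w)
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    fb = sum(wi * fi for wi, fi in zip(w, yhat)) / sw
    num = sum(wi * (yi - yb) * (fi - fb) for wi, yi, fi in zip(w, y, yhat))
    syy = sum(wi * (yi - yb) ** 2 for wi, yi in zip(w, y))
    sff = sum(wi * (fi - fb) ** 2 for wi, fi in zip(w, yhat))
    return num ** 2 / (syy * sff)


def r2_w_tilde(y, yhat, w):
    """Eq. (14): 1 - weighted residual SS / weighted total SS."""
    sw = sum(w)
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sse = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, yhat))
    sst = sum(wi * (yi - yb) ** 2 for wi, yi in zip(w, y))
    return 1 - sse / sst


ones = [1.0] * len(x)
b_ls = wls_line(x, y, ones)
r2_ls = r2_w_tilde(y, [b_ls[0] + b_ls[1] * xi for xi in x], ones)  # classical R^2, Eq. (12)

b_m, w = m_fit(x, y)
yhat_m = [b_m[0] + b_m[1] * xi for xi in x]
print(f"LS: {b_ls[0]:.3f} + {b_ls[1]:.3f}x, R^2 = {r2_ls:.3f}")
print(f"M : {b_m[0]:.3f} + {b_m[1]:.3f}x, R^2_w = {r2_w(y, yhat_m, w):.3f}, "
      f"tilde R^2_w = {r2_w_tilde(y, yhat_m, w):.3f}")
print("weights:", [round(wi, 2) for wi in w])
```

With all weights equal to 1, `r2_w_tilde` reduces to the classical $R^2$ of Eq. (12), which is how `r2_ls` is obtained here.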


Method 

In this study, both simple and multiple linear regression analyses were performed on various data sets. The classical method of least squares and the robust M-regression estimator were compared with respect to the coefficient of determination.


The Research Data 

In this study, three data sets were analyzed. The first data set is hypothetical and contains the final grades of 15 randomly selected students in Linear Algebra and General Mathematics at Mersin University's Faculty of Education. Simple linear regression was used to investigate whether success in General Mathematics predicted success in Linear Algebra.

The second and third data sets consist of real data. These data were analyzed using the least squares method and published in two papers before this study. The first of these was carried out by Gündüz and Çelikkaleli (2009). That study aimed to analyze how male and female students' levels of aggressiveness are predicted by the following variables: academic self-efficacy belief, peer pressure, and anxiety. The research group consisted of 231 high school students (129 females; 102 males) aged between 14 and 19. The Aggressiveness Scale, the Academic Self-Efficacy Scale, the Peer Pressure Scale, and the Anxiety Inventory were used as measurement instruments.

The second of these (the third data set) was derived from Çapulcuoğlu (2012). This descriptive study aimed to examine students' burnout levels according to gender, grade level, school type, and perceived level of academic achievement, and to investigate the relationship between student burnout and factors such as coping with stress, test anxiety, academic self-efficacy, and parental attitudes. The study group consisted of 1,385 high school students in various districts of the city of Mersin, Turkey, during the 2010-2011 academic year. The Maslach Burnout Inventory-Student Survey (MBI-SS) was used to measure students' burnout levels; the Coping with Stress Styles Scale was used to measure the styles used to cope with stress; the Test Anxiety Inventory was administered to gauge test anxiety level; the Adolescence Self-Efficacy Scale was applied to evaluate self-efficacy; the Parental Attitude Scale was used to measure parental attitudes; and a Personal Information Sheet was used to gather demographic data.


Data Analysis 

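The data were analyzed using R version 2.15.1 (Chambers, Eddy, Härdle, Sheather, & Tierney, 2002; Dalgaard, 2008; R Core Team, 2013; Wilcox, 2005). The code needed for $R^2_w$ and $\tilde{R}^2_w$ is given below; `x` holds the regressor(s) and `y` the response:

```r
ekk <- lsfit(x, y, intercept = TRUE)           # least squares estimation
M_tah <- rreg(x, y, int = TRUE, iter = 100)    # robust M-estimator

w <- M_tah$w                                   # M-estimator weights
ysapka <- M_tah$fitted.values                  # M-estimator fitted values

ywcizgi <- (1 / sum(w)) * sum(w * y)           # weighted mean of y
ywsapkacizgi <- (1 / sum(w)) * sum(w * ysapka) # weighted mean of fitted values

# Eq. (13): robust R^2 as a squared weighted correlation
RKAREw <- (sum(w * (y - ywcizgi) * (ysapka - ywsapkacizgi)) /
           sqrt(sum(w * (y - ywcizgi)^2) *
                sum(w * (ysapka - ywsapkacizgi)^2)))^2

# Eq. (14): total sum of squares form of the robust R^2
RKAREwtilda <- 1 - (sum(w * (y - ysapka)^2) / sum(w * (y - ywcizgi)^2))
```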


Results 

The Results of Hypothetical Data 

Table 2 shows the simple linear regression results for both least squares and the M-estimator.

Table 2.

Least Squares and M-Estimator for Simple Linear Regression

| # | GMat. (x) | LCeb. (y) | $\hat{y}_{LS} = 32{,}807 + 0{,}235x$ | $\hat{y}_M = 12{,}224 + 0{,}572x$ | Weights obtained by M-estimator (w) |
|---|---|---|---|---|---|
| 1 | 80 | 50 | 51,59 | 58,04 | 0,96 |
| 2 | 67 | 50 | 48,54 | 50,59 | 1,00 |
| 3 | 57 | 56 | 46,19 | 44,87 | 0,92 |
| 4 | 35 | 30 | 41,02 | 32,27 | 0,99 |
| 5 | 80 | 60 | 51,59 | 58,04 | 1,00 |
| 6 | 60 | 30 | 46,89 | 46,58 | 0,82 |
| 7 | 80 | 70 | 51,59 | 58,04 | 0,90 |
| 8 | 82 | 51 | 52,06 | 59,18 | 0,96 |
| 9 | 70 | 50 | 49,24 | 52,31 | 1,00 |
| 10 | 72 | 54 | 49,71 | 53,46 | 1,00 |
| 11 | 90 | 10 | 53,94 | 63,76 | 0,00 |
| 12 | 85 | 73 | 52,76 | 60,90 | 0,89 |
| 13 | 60 | 56 | 46,89 | 46,58 | 0,94 |
| 14 | 84 | 52 | 52,53 | 60,33 | 0,96 |
| 15 | 71 | 52 | 49,47 | 52,88 | 1,00 |

As shown in Table 2, observation 11 receives zero weight under the M-estimator, and the least squares estimate differs markedly from the M-estimate. This indicates a conflict between the M-estimator and this single outlier. The classical $R^2$ is 0,044, whereas $R^2_w$ equals 0,466. The effect of the outlier and the resistance of the M-estimate can easily be seen by excluding the outlier from the data: Table 3 shows the estimates with and without observation 11 and thus depicts the effect of a single outlier.

Aggressiveness Level Data

For these data (Gündüz & Çelikkaleli, 2009), regression analysis was conducted separately for the 102 male and 129 female students. The results for the 102 male students are shown in Table 4.

The weights obtained by the M-estimator show that observations 52, 65, and 101 each hold near-zero weight, whereas the others hold weights of approximately 1. This means that these three observations are outliers. As expected, the robust coefficient of determination is higher than the classical one. The same analysis was performed on the 129 female students, and the results are shown in Table 5. Here, observations 9, 20, 69, 78, 79, 92, 95, and 128 hold near-zero weights and the others weights of about 1. Again, as expected, the robust coefficient of determination is higher than the classical one.


Table 4.

Comparison of Estimators for 102 Male Students

| Independent variables | LS Estimation | Std. Err. | M-estimation | Std. Err. |
|---|---|---|---|---|
| SABIT (constant) | 125,151 | 12,53 | 113,213 | 11,85 |
| SK | 0,084 | 0,23 | 0,165 | 0,22 |
| AB | 0,243 | 0,07 | 0,255 | 0,06 |
| AYI | -0,667 | 0,26 | -0,429 | 0,24 |
| $R^2$ | 0,235 | | 0,280 | |



Table 5.

Comparison of Estimators for 129 Female Students

| Independent variables | LS Estimation | Std. Err. | M-estimation | Std. Err. |
|---|---|---|---|---|
| SABIT (constant) | 99,058 | 13,62 | 103,188 | 12,34 |
| SK | 0,801 | 0,22 | 0,756 | 0,20 |
| AB | 0,234 | 0,07 | 0,174 | 0,06 |
| AYI | -0,960 | 0,23 | -0,948 | 0,20 |
| $R^2$ | 0,297 | | 0,363 | |



Table 3.

Least Squares and M-Estimate Results with and without Observation 11

| Term | LS, 15 observations: Estimate | Std. Er. | M-estimator, 15 observations: Estimate | Std. Er. | LS without observation 11: Estimate | Std. Er. | M-estimator without observation 11: Estimate | Std. Er. |
|---|---|---|---|---|---|---|---|---|
| Constant | 32,807 | 22,01 | 12,224 | 14,05 | 11,257 | 13,03 | 11,984 | 13,65 |
| GMat | 0,235 | 0,30 | 0,572 | 0,19 | 0,586 | 0,18 | 0,576 | 0,19 |
| $R^2$ | 0,044 | | 0,466 | | 0,461 | | 0,468 | |

Table 6 shows the estimates with and without the outlying observations for both male and female students. Note that the least squares fit without outliers is approximately the same as the M-estimate based on the full data.
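The two least squares columns of Table 3 can be checked directly from the data listed in Table 2. This short sketch, with a hypothetical helper `ls_line`, refits the simple regression on all 15 observations and again with observation 11 removed; for a single regressor, $R^2 = S_{xy}^2 / (S_{xx} S_{yy})$.

```python
# Table 2 data: GMat. (x) and LCeb. (y)
x = [80, 67, 57, 35, 80, 60, 80, 82, 70, 72, 90, 85, 60, 84, 71]
y = [50, 50, 56, 30, 60, 30, 70, 51, 50, 54, 10, 73, 56, 52, 52]


def ls_line(x, y):
    """Simple least squares fit; returns (intercept, slope, R^2)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    syy = sum((b - yb) ** 2 for b in y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    b1 = sxy / sxx
    return yb - b1 * xb, b1, sxy ** 2 / (sxx * syy)


full = ls_line(x, y)
keep = [i for i in range(len(x)) if i != 10]  # drop observation 11 (index 10)
without_11 = ls_line([x[i] for i in keep], [y[i] for i in keep])
print("all 15:         b0 = %.3f, b1 = %.3f, R^2 = %.3f" % full)
print("without obs 11: b0 = %.3f, b1 = %.3f, R^2 = %.3f" % without_11)
```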

Normal Q-Q plots may provide a suitable approach for a researcher to detect outliers and to gauge goodness of fit. Because Q-Q plots and other graphical techniques are highly subjective, however, more formal tests are required to identify a plausible distribution for the data as well as to identify outliers, inliers, and other data anomalies (Tiku & Akkaya, 2004).
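As an illustration only, the coordinates of a normal Q-Q plot for the least squares residuals of the Table 2 data can be computed with Python's standard library. The plotting positions $(k - 0.5)/n$ are one common convention and are an assumption here; no plot is drawn, the pairs are simply printed. The standardized residual of observation 11 should lie well below its corresponding normal quantile.

```python
import math
from statistics import NormalDist

# Table 2 data: GMat. (x) and LCeb. (y)
x = [80, 67, 57, 35, 80, 60, 80, 82, 70, 72, 90, 85, 60, 84, 71]
y = [50, 50, 56, 30, 60, 30, 70, 51, 50, 54, 10, 73, 56, 52, 52]

n = len(x)
xb, yb = sum(x) / n, sum(y) / n
b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
b0 = yb - b1 * xb
e = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # least squares residuals
s = math.sqrt(sum(v * v for v in e) / (n - 2))   # Eq. (8) with p = 2

order = sorted(range(n), key=lambda i: e[i])     # observation indices, smallest residual first
sample = [e[i] / s for i in order]               # ordered standardized residuals, Eq. (7)
theoretical = [NormalDist().inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

for i, q, r in zip(order, theoretical, sample):
    print(f"obs {i + 1:2d}: normal quantile {q:6.2f}  standardized residual {r:6.2f}")
```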


Table 6.

Least Squares and M-Estimate Results without Outliers

| Independent variables | LS Estimate, 99 males | Std. Err. | M Estimate, 99 males | Std. Err. | LS Estimate, 121 females | Std. Err. | M Estimate, 121 females | Std. Err. |
|---|---|---|---|---|---|---|---|---|
| CONSTANT | 114,250 | 11,24 | 115,305 | 11,84 | 110,743 | 10,74 | 110,126 | 11,45 |
| SK | 0,142 | 0,20 | 0,154 | 0,21 | 0,610 | 0,17 | 0,613 | 0,19 |
| AB | 0,269 | 0,06 | 0,255 | 0,06 | 0,156 | 0,05 | 0,158 | 0,06 |
| AYI | -0,481 | 0,24 | -0,501 | 0,25 | -0,953 | 0,18 | -0,972 | 0,19 |
| $R^2$ | 0,301 | | 0,316 | | 0,342 | | 0,372 | |

Burnout Level Data

For these data, regression analysis was conducted separately for the MBI-SS sub-dimensions of exhaustion, desensitization, and self-efficacy. The results for the sub-dimension of exhaustion are shown in Table 7.

Table 7.

Comparison of Estimators for the Sub-Dimension of Exhaustion

| Independent Variables | LS Estimate | Std. Err. | M-Estimate | Std. Err. |
|---|---|---|---|---|
| CONSTANT | 14,930 | 1,22 | 14,375 | 1,18 |
| SELF-CONFIDENT APPROACH | 0,008 | 0,03 | 0,028 | 0,03 |
| HELPLESS APPROACH | 0,063 | 0,03 | 0,050 | 0,03 |
| SUBMISSIVE APPROACH | 0,137 | 0,04 | 0,144 | 0,03 |
| OPTIMISTIC APPROACH | -0,277 | 0,04 | -0,279 | 0,04 |
| SEEKING OF SOCIAL SUPPORT | -0,082 | 0,04 | -0,066 | 0,04 |
| ACADEMIC COMPETENCY | -0,213 | 0,02 | -0,238 | 0,02 |
| SOCIAL COMPETENCY | 0,047 | 0,02 | 0,039 | 0,02 |
| EMOTIONAL COMPETENCY | 0,063 | 0,02 | 0,046 | 0,02 |
| TEST ANXIETY | 0,043 | 0,01 | 0,039 | 0,01 |
| DEMOCRATIC ATTITUDE | -0,010 | 0,01 | 0,007 | 0,01 |
| PROTECTIVE HEADREQUESTOR ATTITUDE | 0,029 | 0,01 | 0,033 | 0,01 |
| AUTHORITARIAN ATTITUDE | -0,017 | 0,02 | -0,015 | 0,02 |
| $R^2$ | 0,239 | | 0,324 | |

For the sub-dimension of exhaustion, 85 of the 1,385 weights are close to zero, and the remaining 1,300 are close to 1. Despite the outliers, the M-estimate provides a higher coefficient of determination than the least squares method. Table 8 shows the results for the sub-dimension of desensitization.


Table 8.

Comparison of Estimators for the Sub-Dimension of Desensitization

| Independent Variables | LS Estimate | Std. Err. | M-Estimate | Std. Err. |
|---|---|---|---|---|
| CONSTANT | 9,437 | 1,02 | 9,990 | 0,95 |
| SELF-CONFIDENT APPROACH | 0,010 | 0,03 | 0,024 | 0,03 |
| HELPLESS APPROACH | 0,018 | 0,03 | 0,015 | 0,02 |
| SUBMISSIVE APPROACH | 0,184 | 0,03 | 0,163 | 0,03 |
| OPTIMISTIC APPROACH | -0,127 | 0,03 | -0,138 | 0,03 |
| SEEKING OF SOCIAL SUPPORT | -0,080 | 0,04 | -0,061 | 0,03 |
| ACADEMIC COMPETENCY | -0,135 | 0,02 | -0,149 | 0,02 |
| SOCIAL COMPETENCY | 0,018 | 0,02 | -0,004 | 0,02 |
| EMOTIONAL COMPETENCY | 0,046 | 0,02 | 0,030 | 0,02 |
| TEST ANXIETY | 0,039 | 0,01 | 0,039 | 0,01 |
| DEMOCRATIC ATTITUDE | -0,016 | 0,01 | -0,009 | 0,01 |
| PROTECTIVE HEADREQUESTOR ATTITUDE | -0,008 | 0,01 | -0,010 | 0,01 |
| AUTHORITARIAN ATTITUDE | 0,028 | 0,02 | 0,034 | 0,02 |
| $R^2$ | 0,203 | | 0,302 | |


Here, 72 of the observations hold near-zero weights, and again the robust coefficient of determination is higher than the classical one. Table 9 shows the results for the sub-dimension of self-efficacy.
Table 9.

Comparison of Estimators for the Sub-Dimension of Self-Efficacy

| Independent Variables | LS Estimate | Std. Err. | M-Estimate | Std. Err. |
|---|---|---|---|---|
| CONSTANT | 4,139 | 0,89 | 3,743 | 0,91 |
| SELF-CONFIDENT APPROACH | 0,080 | 0,02 | 0,083 | 0,03 |
| HELPLESS APPROACH | -0,041 | 0,02 | -0,047 | 0,02 |
| SUBMISSIVE APPROACH | 0,055 | 0,03 | 0,067 | 0,03 |
| OPTIMISTIC APPROACH | 0,111 | 0,03 | 0,125 | 0,03 |
| SEEKING OF SOCIAL SUPPORT | -0,205 | 0,03 | -0,223 | 0,03 |
| ACADEMIC COMPETENCY | 0,171 | 0,02 | 0,178 | 0,02 |
| SOCIAL COMPETENCY | 0,096 | 0,02 | 0,111 | 0,02 |
| EMOTIONAL COMPETENCY | 0,003 | 0,02 | -0,001 | 0,02 |
| TEST ANXIETY | -0,014 | 0,01 | -0,014 | 0,01 |
| DEMOCRATIC ATTITUDE | 0,024 | 0,01 | 0,023 | 0,01 |
| PROTECTIVE HEADREQUESTOR ATTITUDE | -0,005 | 0,01 | -0,007 | 0,01 |
| AUTHORITARIAN ATTITUDE | 0,022 | 0,01 | 0,022 | 0,01 |
| $R^2$ | 0,288 | | 0,344 | |



In contrast to the previous sub-dimensions, only eight observations have very low weights here. Nevertheless, the least squares coefficient of determination was strongly affected by these outliers.


Discussion

Regression analysis is an important statistical tool routinely applied in science. Of the many possible regression techniques, the least squares method has generally been adopted because of tradition and the ease of computation it provides (Maronna et al., 2006; Rousseeuw & Leroy, 1987). However, the assumptions behind the techniques used are just as important as the techniques themselves (Huber, 1981).

To assess the quality of the fit in a multiple linear regression, the coefficient of determination, or $R^2$, despite being a simple tool, is the one most used by practitioners. Indeed, it is reported in most statistical analyses, and although it is not recommended as a final model selection tool, it does provide an indication of the suitability of the chosen explanatory variables for predicting the response. In the classical setting, it is well known that the least squares fit and the coefficient of determination can be arbitrary and/or misleading in the presence of a single outlier. In many applied settings, both the assumption of normality of the errors and the absence of outliers are difficult to establish. In these cases, robust procedures for estimation and inference in linear regression are available and provide a suitable alternative (Renaud & Feser, 2010).

In this paper, two important points have been illustrated by means of both hypothetical and real data: the robust coefficient of determination and the identification of outliers. The robust coefficient of determination has shown that outliers unduly affect the least squares estimator and that the M-estimator may be a suitable alternative to the least squares method when the data contain one or more outliers. As the analyses above show, the M-estimate minimized the effect of outliers, and the fitted model had a larger $R^2$ value; that is, the proportion of variation explained by the independent variables was higher for the model fitted by the M-estimator. Moreover, the M-weights allow researchers to identify outliers and to analyze the data without removing the outliers from the data sets.

On the other hand, data may contain outliers in the x and/or y directions, and the M-estimator may not be robust to outliers in the x direction. In such a situation, GM-estimators, which are robust to outliers in both the x and y directions, may be more appropriate for this kind of data (Arslan & Billor, 1996; Coşkuntuncel, 2010).